Principal component analysis of flow-volume curves in COPDGene to link spirometry with phenotypes of COPD

Background
Parameters from maximal expiratory flow-volume curves (MEFVC) have been linked to CT-based parameters of COPD. However, the association between MEFVC shape and phenotypes such as emphysema, small airways disease (SAD) and bronchial wall thickening (BWT) has not been investigated.

Research question
We analyzed whether the shape of MEFVC can be linked to CT-determined emphysema, SAD and BWT in a large cohort of COPDGene participants.

Study design and methods
In the COPDGene cohort, we used principal component analysis (PCA) to extract patterns from MEFVC shape and performed multiple linear regression to assess the association of these patterns with CT parameters over the entire COPD spectrum, in mild COPD and in moderate-severe COPD.

Results
Over the entire spectrum, as well as in mild and in moderate-severe COPD, principal components of MEFVC were important predictors for the continuous CT parameters. Their contribution to the prediction of emphysema diminished when classical pulmonary function test parameters were added. For SAD, the components remained very strong predictors. The adjusted R2 was higher in moderate-severe COPD, while in mild COPD the adjusted R2 for all CT outcomes was low: 0.28 for emphysema, 0.21 for SAD and 0.19 for BWT.

Interpretation
The shape of the maximal expiratory flow-volume curve as analyzed with PCA is not an appropriate screening tool for early disease phenotypes identified by CT scan. However, it contributes to assessing emphysema and SAD in moderate-severe COPD.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12931-023-02318-4.


Introduction
Chronic obstructive pulmonary disease (COPD) is often diagnosed after significant loss of lung function, as symptoms can remain mild to absent, and are often neglected by patients in the early disease stages. COPD is presumed to start as a smoldering disease, with small airways and parenchymal damage accumulating for many years without being noticed by patients or physicians [1,2]. The ability to identify COPD in the early stage is key in the appropriate management of the disease aimed at improving patient outcomes, as well as reducing overall costs [3]. Spirometry is currently put forward as the most appropriate diagnostic tool, as it is non-invasive, easy to perform, and implementable at low cost. The spirometry diagnosis of COPD is based on a post-bronchodilator forced expiratory volume in one second/forced vital capacity (FEV1/FVC) ratio below the lower limit of the reference population in a clinical context of exposure to noxious particles [4].
A reduced forced expiratory flow between 25 and 75% of FVC (FEF25-75) has been proposed as a sign of small airways disease in smokers who are only at risk of developing COPD [5,6]. Moreover, recent large population studies in smoking individuals demonstrate that early pathological changes visualized on CT may also occur in subjects with 'normal' spirometry [7]. Normal, if not only defined by the FEV1/FVC ratio, is outlined by spirometry parameters varying within the range of a healthy non-smoking reference group [8,9]. Even within the range of normality, the shape or contour of the maximal expiratory flow-volume curve (MEFVC) has been of continuous interest [10]. The concavity of the curve, often referred to as the kink, has been associated with emphysema and attributed to airway collapse and loss of elastic recoil [11]. Topalovic et al. proposed the angle of collapse of MEFVC to quantify airway collapse and detect CT-defined emphysema in heavy smokers [12]. Dominelli et al. quantified the shape of MEFVC with the slope ratio index [13]. Bhatt et al. later proposed the parameter D, which describes lung volume as an exponential function of time, and the peak index, which models the number of peaks adjusted for lung size [14,15]. An overview of all indices can be found in the comprehensive review by Hoesterey et al. [16]. That review postulates that further analysis of the shape of MEFVC has the potential to yield parameters that can help detect early airway obstruction [16].
In a large subgroup of the Genetic Epidemiology of COPD study (COPDGene), we used principal component analysis (PCA) to comprehensively characterize the shape of MEFVC and linked the PCA components to CT-based parameters in subjects with mild and moderate-severe airflow obstruction.

Study subjects
We used subjects enrolled in the COPDGene study, which is a large US-based multicenter study including current and former smokers aged 45-80 years (n = 10,198) with at least ten pack-years. Details of the study design have been reported previously [17]. The study was approved by the local institutional review boards at each of the 21 clinical centers and all subjects provided written documentation of informed consent. The available data included raw spirometry and CT imaging data. For this analysis, we grouped the subjects by stage of the Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines according to FEV1, FVC and FEV1/FVC. GOLD I subjects belonged to the mild stage group, while GOLD II-III-IV subjects belonged to the moderate-severe stage group.

Spirometry and CT imaging data
Using a standardized protocol [18] and spirometer (NDD EasyOne Spirometer), 9841 participants performed spirometry. Expiratory flow-volume curves and volume-time curves were available. CT scans were obtained at total lung capacity (TLC) and at the end of normal expiration (functional residual capacity, FRC) using multidetector CT scanners. CT densitometry was used to define the presence of emphysema and small airways disease (SAD). Both %emphysema and %gas-trapping were computed using parametric response mapping (PRM) to identify the extent of emphysema (PRM emph) and functional small airways disease (PRM fSAD) based on CT scans at TLC and FRC simultaneously [19,20]. Bronchial wall thickening (BWT) was assessed by airway wall thickness at an internal perimeter of 10 mm (Pi10). Pi10 was calculated by fitting a linear regression model on all airways of different internal perimeters with the square root of the wall area as dependent variable and perimeter as independent variable. Quantitative parameters of these scans were extracted using Thirona software.
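The Pi10 calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Thirona implementation: the function name `pi10` and the synthetic airway measurements are hypothetical, and the fit is an ordinary least-squares line evaluated at an internal perimeter of 10 mm.

```python
import numpy as np

def pi10(internal_perimeter_mm, wall_area_mm2):
    """Square root of wall area predicted at an internal perimeter of 10 mm."""
    x = np.asarray(internal_perimeter_mm, dtype=float)
    y = np.sqrt(np.asarray(wall_area_mm2, dtype=float))
    # Ordinary least squares: sqrt(wall area) = intercept + slope * perimeter
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept + slope * 10.0

# Synthetic airways (hypothetical values) lying on sqrt(WA) = 1.0 + 0.1 * perimeter
perimeters = np.array([6.0, 8.0, 12.0, 16.0, 20.0])
wall_areas = (1.0 + 0.1 * perimeters) ** 2
pi10_value = pi10(perimeters, wall_areas)  # close to 2.0 for this synthetic line
```

Because the synthetic points lie exactly on a line, the fitted value at 10 mm recovers the generating relation; real airway measurements would scatter around the fitted line.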

Shape analysis
To focus purely on the shape of MEFVC, we scaled each curve in both axes by 1/FVC for each subject to normalize on FVC and to preserve the shape of the curves. To perform a shape analysis, we applied PCA on the curves (flow over volume datapoints) to extract the most dominant patterns ordered according to the proportion in shape variance they explain. Each MEFVC could then be accurately approximated as a linear combination of these principal components (PC) or patterns with the coefficients describing how much each pattern contributed to the shape of the MEFVC. We computed these coefficients for all subjects in the dataset and linked these to the continuous CT parameters. We denoted the PCs following their order, e.g., the first PC was denoted as PC1. A more extensive description of the PCA computation can be found in the online supplement.
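The normalization and decomposition steps can be sketched in a few lines of NumPy. This is a hedged illustration on synthetic curves, not the code used in the study: the toy curve model, the variable names and the use of a singular value decomposition to obtain the principal components are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 subjects, flow sampled at 200 equidistant volume points
n_subjects, n_points = 50, 200
rel_volume = np.linspace(0.0, 1.0, n_points)         # volume / FVC
fvc = rng.uniform(2.0, 5.0, size=n_subjects)          # litres (synthetic)
concavity = rng.uniform(1.0, 3.0, size=n_subjects)    # synthetic shape parameter
# Toy descending limb: flow (L/s) falls from a peak to zero at FVC
flow = 2.0 * fvc[:, None] * (1.0 - rel_volume[None, :]) ** concavity[:, None]

# Scale both axes by 1/FVC: volume runs from 0 to 1, flow is divided by FVC
curves = flow / fvc[:, None]

# PCA via singular value decomposition of the mean-centred curves
mean_curve = curves.mean(axis=0)
centred = curves - mean_curve
u, s, vt = np.linalg.svd(centred, full_matrices=False)
components = vt                        # rows: PC1, PC2, ... (orthonormal patterns)
scores = centred @ components.T        # per-subject coefficient of each pattern
explained_ratio = s**2 / np.sum(s**2)  # proportion of shape variance per component

# Approximate every curve as the mean plus a combination of the first four patterns
k = 4
approx = mean_curve + scores[:, :k] @ components[:k]
```

The rows of `scores` play the role of the coefficients that are linked to the CT parameters; using all components instead of the first `k` reproduces the curves exactly.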

CT-based phenotypes
With the quantitative CT (QCT) values, we defined the presence of emphysema, SAD and BWT using the upper limit of normal (95th percentile, ULN) cut-offs based on never-smoked normal control subjects in the COPDGene dataset; 107 such control subjects were enrolled in Phase 1. Based on these cut-offs, we defined eight CT-based phenotypes according to the presence of emphysema and/or SAD and/or BWT. For notation of the phenotypes, emphysema, SAD and BWT were denoted as E, S and B, respectively. A dash was used in the absence of a disease. An overview of all notations can be found in Table 1. We compared the PRM cut-offs with the ULN cut-offs obtained when the %voxels < −950 Hounsfield units (HU) and %voxels < −856 HU definitions for emphysema and SAD, respectively, were used on the same never-smoked normal subjects.
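The cut-off and labelling logic can be sketched as below. The function names are hypothetical, the control values are synthetic, and treating a value strictly above the cut-off as abnormal is an assumption; the three cut-off values are those reported in the Results.

```python
import numpy as np

def upper_limit_of_normal(control_values):
    """95th percentile of a CT parameter in the control group."""
    return float(np.percentile(control_values, 95))

def phenotype_label(emph, fsad, pi10, cutoffs):
    """Three-character label: E, S, B when above the cut-off, '-' otherwise."""
    e = "E" if emph > cutoffs["emph"] else "-"
    s = "S" if fsad > cutoffs["fsad"] else "-"
    b = "B" if pi10 > cutoffs["pi10"] else "-"
    return e + s + b

# Synthetic control values, only to show the percentile call
controls = np.arange(1, 101, dtype=float)
uln_demo = upper_limit_of_normal(controls)

# Cut-offs as reported in the Results (PRM emph in %, PRM fSAD in %, Pi10 in mm)
cutoffs = {"emph": 1.7, "fsad": 14.7, "pi10": 2.2}
labels = [
    phenotype_label(0.5, 10.0, 2.0, cutoffs),  # no abnormality
    phenotype_label(3.0, 20.0, 2.0, cutoffs),  # emphysema and SAD
    phenotype_label(3.0, 20.0, 2.5, cutoffs),  # all three abnormalities
]
```

Three binary criteria give 2^3 = 8 possible labels, which matches the eight CT-based phenotypes.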

Data and statistical analysis
We performed descriptive statistics on demographic, spirometric and CT variables per GOLD stage and per CT-based phenotype. Data are presented as no. (%) or median [interquartile range, Q1-Q3]. Multiple linear regression was used to assess the independent effect of each component in predicting PRM emph, PRM fSAD and Pi10 with adjustment for age, sex, height, weight and pack-years. Standard spirometric parameters (FEV1, FVC, FEV1/FVC, PEF, and FEF25-75) were then added and the standardized β coefficients of the model were used to assess the importance of each predictor. We used the adjusted R2, the coefficient of determination adjusted for the number of predictors, to evaluate the goodness-of-fit of the models. Regression analysis was done over the entire spectrum, in mild COPD (GOLD I) and in moderate-severe COPD (GOLD II-III-IV). We compared the principal components with existing MEFVC parameters: angle of collapse [21], area under the forced expiratory flow-volume loop [22], obstructive index [23] and peak index [15]. Statistical analysis was conducted using Python 3 (Python Software Foundation) with the scientific and statistical packages SciPy and StatsModels (open source, scipy.org and statsmodels.org). The significance level was set at 0.05.
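To make the two reported quantities concrete, the sketch below computes standardized β coefficients and the adjusted R2 for a multiple linear regression. It is an illustration on synthetic data and uses plain NumPy least squares rather than the StatsModels routines named above; the function name `fit_standardized_ols` and the predictor layout are assumptions.

```python
import numpy as np

def fit_standardized_ols(X, y):
    """OLS on z-scored predictors and outcome.

    Returns the standardized beta coefficients and the adjusted R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    design = np.column_stack([np.ones(n), Xz])   # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, yz, rcond=None)
    residuals = yz - design @ coef
    r2 = 1.0 - np.sum(residuals**2) / np.sum((yz - yz.mean()) ** 2)
    # Penalize R^2 for the number of predictors p
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coef[1:], adj_r2

rng = np.random.default_rng(1)
n = 500
# Hypothetical predictors standing in for PC1..PC4 scores and one covariate
X = rng.normal(size=(n, 5))
# Synthetic outcome: strong effect of column 0, weak effect of column 1, plus noise
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
betas, adj_r2 = fit_standardized_ols(X, y)
```

Because every variable is z-scored before fitting, the magnitudes of `betas` are directly comparable across predictors, which is what allows them to be ranked by importance.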

Population characteristics
Of the 9841 participants who performed spirometry, 9207 (93.6%) had acceptable flow-volume loops according to the American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines [18]. Subjects with Preserved Ratio Impaired Spirometry (PRISm, FEV1/FVC ≥ 0.7 but FEV1 < 80% predicted, n = 1055) were not considered, since other disease factors such as thoracic wall restriction or cardiac disease, which are more prevalent in this subgroup, would influence our findings [24,25]. Ultimately, 6302 subjects were used for the analysis. The flow of the eligible subjects for this analysis is described in Additional file 1: Figure S1. The characteristics of the remaining participants per GOLD stage are reported in Table 1. Sixty-seven of the 107 never-smoked control subjects had both spirometry and CT data available. Of the 6302 subjects used for the analysis, 67 were never-smokers and 6235 were ever-smokers. Of those 6235 ever-smokers, 3214 were former smokers and 3021 were current smokers.
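The grouping used for exclusion and stratification can be written as a small helper. This is a sketch, assuming the fixed FEV1/FVC threshold of 0.7 given in the text and the standard GOLD boundary of FEV1 at 80% predicted between GOLD I and GOLD II; the function name and the "no obstruction" label are hypothetical.

```python
def spirometry_group(fev1_fvc, fev1_pct_pred):
    """Assign a subject to an analysis group from post-bronchodilator spirometry."""
    if fev1_fvc >= 0.7:
        # Preserved ratio: PRISm when FEV1 is below 80% of predicted
        return "PRISm" if fev1_pct_pred < 80 else "no obstruction"
    if fev1_pct_pred >= 80:
        return "mild (GOLD I)"
    return "moderate-severe (GOLD II-III-IV)"

groups = [
    spirometry_group(0.75, 70),  # preserved ratio, reduced FEV1
    spirometry_group(0.65, 85),  # obstructed, FEV1 at least 80% predicted
    spirometry_group(0.55, 45),  # obstructed, reduced FEV1
    spirometry_group(0.80, 95),  # normal spirometry
]
```

Subjects falling in the PRISm group would be dropped before the regression analyses.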

Principal components
The mean standardized flow-volume curves per GOLD stage are visualized in Fig. 1. The curves were sampled at 200 equidistant points, resulting in 200 principal components (full decomposition), with the first ten explaining 78% of the variance in MEFVC shape (Fig. 2A). With the first 100 components, 98.4% of the variance could be explained. To visualize the influence of the components on MEFVC, we depicted a −45 to +45 percent change (5th to 95th percentile) of the four most dominant components as compared to the overall mean MEFVC of the population (Fig. 2B). We visually assessed the influence of each of these four components on the MEFVC: PC1 influences PEF and the descending limb without altering the angle of collapse or concavity. PC2 pivots the descending limb around a fixed point, thereby also influencing PEF. PC3 and PC4 mainly model concavity in MEFVC. The remainder of the analyses were done with the first four principal components, since more components did not improve the model fits (adjusted R2) in the following analyses.
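The two quantities reported here, cumulative explained variance and the effect of a single component on the mean curve, can be illustrated as follows. The curves are synthetic, and shifting the mean curve by the 5th and 95th percentile of the PC1 scores is one plausible reading of the visualization, stated here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in for FVC-normalized curves: 300 subjects x 200 points
x = np.linspace(0.0, 1.0, 200)
exponent = rng.uniform(1.0, 3.0, size=(300, 1))
peak = rng.uniform(1.5, 2.5, size=(300, 1))
curves = peak * (1.0 - x) ** exponent

mean_curve = curves.mean(axis=0)
centred = curves - mean_curve
_, s, vt = np.linalg.svd(centred, full_matrices=False)
scores = centred @ vt.T

# Cumulative proportion of shape variance explained by the first k components
cumulative = np.cumsum(s**2) / np.sum(s**2)
k_for_95 = int(np.searchsorted(cumulative, 0.95) + 1)  # smallest k reaching 95%

# Effect of PC1: mean curve shifted by the 5th and 95th percentile of its scores
low, high = np.percentile(scores[:, 0], [5, 95])
curve_low = mean_curve + low * vt[0]
curve_high = mean_curve + high * vt[0]
```

Plotting `curve_low`, `mean_curve` and `curve_high` against `x` shows which part of the curve a component moves, which is the kind of visual assessment described for PC1 to PC4.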

CT-Phenotypes
We determined 1.7% for PRM emph, 14.7% for PRM fSAD and 2.2 mm for Pi10 (Fig. 3) as the upper limit of normal (ULN) in a cohort of never-smoked normal controls (n = 67) and considered them as cut-offs for the presence of CT-based abnormalities. Figure 1B shows the mean MEFVC per CT-based phenotype and Fig. 1C the mean MEFVC per number of abnormalities as seen on CT. The characteristics of the subjects per CT-based phenotype are reported in Additional file 1: Table S5. The cut-offs for emphysema and SAD were 5.8% and 18.6%, respectively, when the classic %voxels < −950 HU and %voxels < −856 HU definitions were used.

Comparison with other MEFVC-derived parameters
Adjusted R2 for each parameter per CT outcome and subgroup can be found in Additional file 1: Table S6. Compared to other MEFVC-derived parameters, the principal components provided a better fit for PRM emph and PRM fSAD. For Pi10, area under the forced expiratory flow-volume loop was the superior parameter. Overall, the classical pulmonary function parameters were superior.

Fig. 3 Percentage emphysema (PRM emph), percentage gas trapping (PRM fSAD) and bronchial wall thickening (Pi10) per group (never-smoked normal control subjects; mild COPD (GOLD I); moderate-severe COPD (GOLD II-III-IV)). The cut-offs (dashed lines) are determined by using the 95th percentile (upper limit of normal) on the control subjects

Discussion
In 6302 subjects in the COPDGene study, we used principal component analysis (PCA) to extract dominant patterns from the shapes of the MEFVC and explored their association with continuous CT-based parameters and eight CT-defined phenotypes based on cut-offs for emphysema, small airways disease and bronchial wall thickening. The advantage of PCA is that no hand-engineered features were required to analyze the MEFVC and the large collection of curves was fully exploited in extracting potential patterns. When compared with existing hand-engineered features, the principal components were superior for emphysema and small airways disease and closely matched the area under the MEFVC for bronchial wall thickening.
We found that a small number of components were sufficient to model a large proportion of the variance in shape of MEFVC. Multivariate analysis for the first four principal components showed that 49, 60 and 39 percent of the variance could be explained for emphysema (PRM emph), small airways disease (PRM fSAD) and bronchial wall thickening (Pi10), respectively. However, when adding classical pulmonary function tests (FEV1, FVC, FEV1/FVC, PEF and FEF25-75) to the models, independent contributions of the principal components were strongly reduced because of high correlations between the components and these parameters (Additional file 1: Table S7). For emphysema (PRM emph), shape-derived components PC1, PC2 and PC3 were still independent contributors. For small airways disease (PRM fSAD) in mild COPD, PC2 was the third most important predictor, whilst it became the most important predictor in moderate-severe COPD, for which FEV1, FVC and FEV1/FVC were no longer significant. These findings highlight the impact of small airways disease on PEF, particularly in more advanced disease stages. For bronchial wall thickening, the fit of the regression model was generally very low, indicating that abnormal thickening of the bronchial walls of the larger airways did not profoundly affect the shape of the MEFVC.
Interestingly, in multinomial logistic regression modelling the number of abnormalities present on CT rather than the type of abnormality (E, SAD, BWT), the pseudo R2 was negative and not significant for mild COPD, indicating that in the mild disease stage, any CT abnormalities are unlikely to be detected by the shape of MEFVC or even the standard lung function parameters. Hence, the current data suggest that our initial hypothesis should be rejected, and that early disease processes as identified on CT cannot be predicted by parameters isolated from the relative form of the maximal expiratory flow-volume curve. In particular, FEF25-75, as a surrogate marker of small airways disease on spirometry, was not predictive in the mild COPD subgroup. This raises the question of whether, in patients with normal spirometry, risk behavior and chronic respiratory symptoms may point to the need for a CT scan, as suggested by Celli et al. [26].
By using the upper limit of normal on the 67 never-smoked normal control subjects in this cohort, we obtained cut-offs for abnormal values of the CT outcomes. With these cut-off values, most of the patients in the mild COPD subgroup had PRM emph and PRM fSAD values within the normal range (Fig. 3). This also demonstrates that mild airflow limitation, as diagnosed by an FEV1/FVC below 0.7, can present with CT scans within normal limits. In these individuals, early airway pathology in terminal or respiratory bronchioles may still be present, as this is beyond the resolution of conventional CT [27]. An alternative explanation may come from the initial lung function values determined by lung growth, which may result in a lower FEV1/FVC ratio and FEV1 without true pathology on CT.
We normalized the MEFVC curve for FVC to adjust for lung volume and hence also anthropometry and age, and to maximally visualize the changes in shape across the different phenotypes. Next, we calculated the mean MEFVC shape per GOLD stage and per CT-based phenotype. The area under the curve decreases and the concavity or so-called kink in the curve increases as lung function deteriorates. For the CT-based phenotypes, the mean shapes are similar for the phenotypes with only one abnormality [--B, -S-, E--], while for those with two abnormalities [-SB, E-B, ES-], the concavity is larger and the area under the curve smaller, which is to be expected as these are the subjects in the higher GOLD stages. Subjects showing evidence of three abnormalities on CT have the largest concavity and the smallest area under the curve on average. Overall, our findings indicate that concavity of the flow-volume loop is linked to more advanced COPD in which emphysema, but also other radiological phenotypes co-occur.

Interpretation
Our analysis demonstrates that the shape of the maximal expiratory flow-volume curve is not an appropriate screening tool for early disease phenotypes identified by CT scan, since neither the principal components nor the classical pulmonary function parameters were linked with emphysema, small airways disease or bronchial wall thickening as seen on CT. In moderate-severe airflow obstruction (GOLD II-III-IV), the concavity of the curve is mainly related to the presence of emphysema in a combined phenotype with small airways disease, with MEFVC shape parameters having a limited but statistically significant association with CT-defined pathologies.

Additional file 1: Table S5. Characteristics per CT-based phenotype. Table S6. Linear regression adjusted R2 values for different parameters derived from maximal expiratory flow-volume curves for emphysema, small airways disease and bronchial wall thickening (PRM emph, PRM fSAD, Pi10 on CT) per subgroup. Table S7. Pearson correlation coefficients between the first four principal components and the classical pulmonary function parameters FEV1, FVC, FEV1/FVC, FEF25-75, PEF. Figure S1. Flow of eligible subjects for this study. CT, computed tomography; COPDGene, Genetic Epidemiology of COPD; PRISm, preserved ratio impaired spirometry.